Sharing Models and Tools for Processing German Clinical Texts

نویسندگان

  • Johannes Hellrich
  • Franz Matthies
  • Erik Faessler
  • Udo Hahn
چکیده

The automatic processing of non-English clinical documents is massively hampered by the lack of publicly available medical language resources for training, testing and evaluating NLP components. We suggest sharing statistical models derived from access-protected clinical documents as a reasonable substitute and provide solutions for sentence splitting, tokenization and POS tagging of German clinical texts. These three components were trained on the confidential FRAMED corpus, a non-sharable collection of various German-language clinical document types. The models derived therefrom outperform alternative components from OPENNLP and the Stanford POS tagger, also trained on FRAMED.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Normalizing Medieval German Texts: from rules to deep learning

The application of NLP tools to historical texts is complicated by a high level of spelling variation. Different methods of historical text normalization have been proposed. In this comparative evaluation I test the following three approaches to text canonicalization on historical German texts from 15th–16th centuries: rule-based, statistical machine translation, and neural machine translation....

متن کامل

How to improve information extraction from German medical records

Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has a great potential to improve clinical routine care, to support clinical research, and to advance personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated an essential prerequisite to which is information extraction from...

متن کامل

(German) Language Processing for Lucene

This paper introduces an open-source Java-package called German Language Processing for Lucene (glp4lucene). Although it was originally developed to work with German texts, it is to a large degree language independent. It aims at facilitating four language processing steps for working with non-English texts and Apache Lucene/Solr: lemmatizing words, weighting terms based on their part-of-speech...

متن کامل

A POS Tagger for Social Media Texts Trained on Web Comments

Using social media tools such as blogs and forums have become more and more popular in recent years. Hence, a huge collection of social media texts from different communities is available for accessing user opinions, e.g., for marketing studies or acceptance research. Typically, methods from Natural Language Processing are applied to social media texts to automatically recognize user opinions. ...

متن کامل

A Nonlinear Model of Economic Data Related to the German Automobile Industry

Prediction of economic variables is a basic component not only for economic models, but also for many business decisions. But it is difficult to produce accurate predictions in times of economic crises, which cause nonlinear effects in the data. Such evidence appeared in the German automobile industry as a consequence of the financial crisis in 2008/09, which influenced exchange rates and a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Studies in health technology and informatics

دوره 210  شماره 

صفحات  -

تاریخ انتشار 2015